    Similarity search and mining in uncertain spatial and spatio-temporal databases

    Current technology trends, such as smartphones, mobile devices, stationary sensors, and satellites, together with a new user mentality of voluntarily sharing information through this technology, produce a huge flood of geo-spatial and geo-spatio-temporal data. This data flood offers tremendous potential for discovering new and possibly useful knowledge. However, measurements are imprecise due to the physical limitations of the devices, and some form of interpolation is needed between discrete time instances. Furthermore, to reduce communication and bandwidth utilization as well as storage requirements, the data is often subjected to reduction, which eliminates some of the recorded values. These issues introduce uncertainty into spatio-temporal data management and raise an imminent need for scalable and flexible data management techniques. The main scope of this thesis is to develop effective and efficient techniques for similarity search and data mining in uncertain spatial and spatio-temporal data. In a plethora of research fields and industrial applications, these techniques can substantially improve decision making, minimize risk, and unearth valuable insights that would otherwise remain hidden. The challenge of effectiveness in uncertain data is to correctly determine the set of possible results, each associated with the correct probability of being a result, so that users can judge how much confidence to place in the returned results. The complementary challenge of efficiency is to compute these results and their probabilities quickly, allowing reasonable querying and mining times even for large uncertain databases.
The paradigm used to master both challenges is to identify a small set of equivalence classes of possible worlds, such that members of the same class can be treated as equivalent in the context of a given query predicate or data mining task. In the scope of this work, this paradigm is formally defined and applied to the most prominent classes of spatial queries on uncertain data, including range queries, k-nearest neighbor queries, ranking queries, and reverse k-nearest neighbor queries. For this purpose, new spatial and probabilistic pruning approaches are developed to further speed up query processing. Furthermore, the proposed paradigm enables the first efficient solution to the problem of frequent co-location mining on uncertain data. Special emphasis is placed on the temporal aspect of applications using modern data collection technologies. While the aforementioned techniques work well for single points in time, predicting query results over time remains a challenge. This thesis fills this gap by modeling an uncertain spatio-temporal object as a stochastic process and applying the above paradigm to efficiently query, index, and mine historical spatio-temporal data.
Modern technologies, e.g., satellite technology and the technology in smartphones, generate a flood of spatial geo-data. In addition, society shows a trend toward voluntarily making this data available on publicly accessible platforms. This data flood has immense potential for discovering new and useful knowledge. However, this data is inherently uncertain spatial data. The uncertainty arises from several aspects. On the one hand, measurements are always subject to measurement inaccuracies. On the other hand, interpolation is necessary between discrete measurement times, which creates additional uncertainty.
Moreover, the data is often deliberately reduced to save storage space and transfer volume, whereby further information is lost. This uncertainty creates an immediate need for scalable and flexible methods for managing and analyzing such data. This work develops effective and efficient techniques for similarity search and data mining on uncertain spatial and uncertain spatio-temporal data. These techniques provide valuable knowledge that can be used for decision making in various research fields as well as in industrial applications. Developing these techniques poses two challenges. First, the techniques must be effective, returning correct results and the probabilities of these results. Second, they must be efficient, delivering results in acceptable time even for very large databases. The dissertation presents a new paradigm that masters both challenges. This paradigm identifies possible database worlds that are equivalent with respect to the given query predicate. The paradigm is formally defined and applied to the most relevant spatial query types in order to develop efficient solutions. These include range queries, k-nearest neighbor queries, ranking queries, and reverse k-nearest neighbor queries. Spatial and probabilistic pruning criteria are developed to exclude insignificant results early. In addition, the first efficient solution to the problem of "spatial co-location mining" on uncertain data is presented. A particular focus of this work is the temporal aspect of modern geo-data.
While the aforementioned techniques work very well for individual points in time, the effective and efficient management of uncertain spatio-temporal data is still a largely unsolved problem. This dissertation solves this problem by modeling uncertain spatio-temporal data as stochastic processes. The above paradigm can be applied to these stochastic processes in order to efficiently query, index, and mine uncertain spatio-temporal data.
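
    The equivalence-class idea can be illustrated on a probabilistic nearest-neighbor query. The sketch below is not the thesis's algorithm. It assumes a hypothetical discrete model in which each uncertain object is a list of (position, probability) alternatives and the objects are mutually independent. Conditioning on one alternative of object i groups all possible worlds in which i takes that position. Within that group, i is the nearest neighbor exactly when every other object lies farther away, so the probability factorizes and the possible worlds never need to be enumerated. The function names are illustrative.

```python
from math import dist, prod

# Hypothetical model: an uncertain object is a list of (position, probability)
# alternatives whose probabilities sum to 1; objects are mutually independent.


def prob_farther_than(obj, q, d):
    """P(dist(obj, q) > d) for a single uncertain object."""
    return sum(p for pos, p in obj if dist(pos, q) > d)


def nn_probabilities(objects, q):
    """Probability that each object is the nearest neighbor of query point q.

    Instead of enumerating every possible world, condition on each alternative
    of object i: all worlds in which i sits at `pos` form one class, and in that
    class i is the nearest neighbor iff every other object is strictly farther
    from q. Under independence this is a product of per-object probabilities.
    Ties in distance are assumed not to occur.
    """
    result = {}
    for i, obj in objects.items():
        total = 0.0
        for pos, p in obj:
            d = dist(pos, q)
            others = prod(prob_farther_than(o, q, d)
                          for j, o in objects.items() if j != i)
            total += p * others
        result[i] = total
    return result


# Object A is at (0, 0) or (4, 0) with equal probability; B is certainly at (1, 0).
objects = {"A": [((0, 0), 0.5), ((4, 0), 0.5)], "B": [((1, 0), 1.0)]}
print(nn_probabilities(objects, (0, 0)))  # {'A': 0.5, 'B': 0.5}
```

    The cost grows with the number of alternatives per object. It does not grow with the number of possible worlds, which is exponential in the number of objects.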

    Riesz-Quincunx-UNet Variational Auto-Encoder for Satellite Image Denoising

    Multiresolution deep learning approaches, such as the U-Net architecture, have achieved high performance in classifying and segmenting images. However, these approaches do not provide a latent image representation and cannot be used to decompose, denoise, and reconstruct image data. The U-Net and other convolutional neural network (CNN) architectures commonly use pooling to enlarge the receptive field, which usually results in irreversible information loss. This study proposes to include a Riesz-Quincunx (RQ) wavelet transform inside the U-Net architecture to reduce noise in satellite images and their time series. The RQ transform combines 1) a higher-order Riesz wavelet transform and 2) orthogonal Quincunx wavelets, both of which have been used to reduce blur in medical images. In the transformed feature space, we propose a variational approach to understand how random perturbations of the features affect the image, in order to further reduce noise. Combining both approaches, we introduce a hybrid RQUNet-VAE scheme for image and time-series decomposition, used to reduce noise in satellite imagery. We present qualitative and quantitative experimental results demonstrating that our proposed RQUNet-VAE was more effective at reducing noise in satellite imagery than other state-of-the-art methods. We also apply our scheme to several applications for multi-band satellite images, including image denoising, image and time-series decomposition by diffusion, and image segmentation.
    Comment: Submitted to IEEE Transactions on Geoscience and Remote Sensing (TGRS)
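
    The abstract's point is that pooling discards information while a wavelet transform does not. This can be illustrated with the simplest orthogonal wavelet. The sketch below deliberately swaps in a one-level 1-D Haar transform, not the Riesz-Quincunx transform or the RQUNet-VAE itself. It pairs the transform with classic soft thresholding of the detail coefficients as a stand-in for denoising. All function names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)  # normalization that makes the Haar transform orthonormal


def haar_forward(x):
    """One level of the 1-D Haar transform; len(x) must be even."""
    approx = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    detail = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Exact inverse of haar_forward (up to floating-point rounding)."""
    out = []
    for a, d in zip(approx, detail):
        out += [S * (a + d), S * (a - d)]
    return out


def max_pool(x):
    """2:1 max pooling, as used for CNN downsampling; not invertible."""
    return [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t (classic wavelet shrinkage)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def haar_denoise(x, t):
    """Suppress small detail coefficients, then reconstruct."""
    approx, detail = haar_forward(x)
    return haar_inverse(approx, soft_threshold(detail, t))


signal = [4.0, 2.0, 5.0, 5.0]
approx, detail = haar_forward(signal)
restored = haar_inverse(approx, detail)
print(all(abs(a - b) < 1e-12 for a, b in zip(signal, restored)))  # True
# Two different signals collapse to the same pooled output:
print(max_pool([4.0, 2.0, 5.0, 5.0]) == max_pool([4.0, 0.0, 5.0, 1.0]))  # True
```

    The wavelet coefficients keep both the coarse approximation and the details, so the input can be rebuilt exactly. Max pooling keeps only one value per pair.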

    Consistent Device Simulation Model Describing Perovskite Solar Cells in Steady-State, Transient, and Frequency Domain

    A variety of experiments on vacuum-deposited methylammonium lead iodide perovskite solar cells are presented. They include JV curves with different scan rates, light intensity-dependent open-circuit voltage, impedance spectra, intensity-modulated photocurrent spectra, transient photocurrents, and transient voltage step responses. All these experimental data sets are successfully reproduced with a single set of parameters by a charge drift-diffusion simulation model incorporating mobile ions and charge traps. While previous modeling studies focused on a single experimental technique, we combine steady-state, transient, and frequency-domain simulations and measurements. Our study is an important step toward quantitative simulation of perovskite solar cells, leading to a deeper understanding of the physical effects in these materials. The analysis of the transient current upon voltage turn-on in the dark reveals that the charge injection properties of the interfaces are triggered by the accumulation of mobile ionic defects. We show that the current rise in voltage step experiments allows conclusions to be drawn about recombination at the interface. Whether one or two mobile ionic species are used in the model has only a minor influence on the observed effects. A delayed current rise observed upon reversing the bias from +3 to -3 V in the dark cannot yet be reproduced by our drift-diffusion model. We speculate that a reversible chemical reaction of mobile ions with the contact material may cause this effect, which would require a future model extension. A parameter variation is performed in order to understand the performance-limiting factors of the device under investigation.

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains, including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline with the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, survey the current state of the art, and describe open challenges for the research community in the coming years.
    Comment: Updated arXiv metadata to include two authors that were missing from the metadata. PDF has not been changed.

    A Hierarchical Spatial Network Index for Arbitrarily Distributed Spatial Objects

    The range query is one of the most important query types in spatial data processing. Geographic information systems use it to find spatial objects within a user-specified range, and it supports data mining tasks such as density-based clustering. In many applications, ranges are computed not in unrestricted Euclidean space but on a network. Most access methods cannot trivially be extended to network space, and existing network index structures partition the network space without considering the data distribution. This can lead to inefficiency because of a highly skewed node distribution. To improve range query processing on networks, this paper proposes a balanced Hierarchical Network index (HN-tree) to query spatial objects on networks. The main idea is to recursively partition the data on the network such that each partition has a similar number of spatial objects. Leveraging the HN-tree, we present an efficient range query algorithm. It is evaluated empirically on three different road networks against several baselines and state-of-the-art network indices. The experimental evaluation shows that the HN-tree substantially outperforms existing methods.
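
    A network range query returns the objects whose shortest-path (network) distance from the query location is within the radius, which generally differs from the Euclidean range. The sketch below shows only these query semantics. It evaluates them with a Dijkstra search that stops expanding at the radius. It does not implement the HN-tree, whose balanced partitions would let the search skip regions of the network. As a hypothetical simplification, objects are assumed to sit on network nodes rather than in the middle of edges. The function and parameter names are illustrative.

```python
import heapq
import math
from collections import defaultdict


def network_range_query(edges, objects_at, source, radius):
    """Objects whose shortest-path distance from `source` is at most `radius`.

    edges:      iterable of (u, v, length) for an undirected road network
    objects_at: dict mapping a node to the ids of objects located at it
    """
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    best = {source: 0.0}          # shortest known distance per node
    heap = [(0.0, source)]
    result = []
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue              # stale heap entry; u was settled earlier
        result.extend(objects_at.get(u, []))
        for v, w in adj[u]:
            nd = d + w
            # Never enqueue nodes beyond the radius: this bounds the search.
            if nd <= radius and nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


edges = [("a", "b", 2), ("b", "c", 2), ("a", "d", 5), ("c", "d", 1)]
objects_at = {"b": ["o1"], "c": ["o2"], "d": ["o3", "o4"]}
print(sorted(network_range_query(edges, objects_at, "a", 4)))  # ['o1', 'o2']
```

    Node d is 5 units from a along either route, so its objects enter the result only once the radius reaches 5.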

    Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro

    Traffic prediction is paramount in a plethora of applications, ranging from individual trip planning to urban planning. Existing work mainly focuses on traffic prediction on road networks. Yet public transportation accounts for a significant portion of overall human mobility and passenger volume. For example, the Washington, DC metro carries on average 600,000 passengers on a weekday. In this work, we address the problem of modeling, classifying, and predicting such passenger volume in public transportation systems. We study the case of the Washington, DC metro using fare card data, specifically passenger in- and outflow at stations. To reduce the dimensionality of the data, we apply principal component analysis to extract latent features for different stations and different calendar days. Our unsupervised clustering results demonstrate that these latent features are highly discriminative. They allow us to derive different station types (residential, commercial, and mixed) and to effectively classify and identify the passenger flow of “unknown” stations. Finally, we show that this classification can be applied to predict the passenger volume at stations. By learning the latent features of stations over some period, we can predict the flow for the following hours. Extensive experimentation against a baseline neural network and two naïve periodicity approaches shows that the latent-feature-based approach considerably improves accuracy.
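
    The dimensionality-reduction step can be sketched as principal component analysis computed by power iteration. This is a minimal illustration, not the study's pipeline. The four time-of-day buckets and the entry counts below are hypothetical toy values, not fare card data. Only the first principal component is computed. In practice a numerical library would normally be used; this dependency-free version just makes the steps explicit.

```python
import math


def center(rows):
    """Subtract the column mean from every feature."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]


def first_principal_component(rows, iters=200):
    """Top eigenvector of the covariance matrix and the per-row scores on it."""
    X = center(rows)
    m = len(X[0])
    # Scatter matrix X^T X; dividing by n - 1 would not change its eigenvectors.
    C = [[sum(r[i] * r[j] for r in X) for j in range(m)] for i in range(m)]
    v = list(X[0])  # start vector, assumed not orthogonal to the top component
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(r[j] * v[j] for j in range(m)) for r in X]
    return v, scores


# Hypothetical hourly entries per station in four buckets:
# [06-09h, 09-16h, 16-19h, 19-24h]
stations = {
    "res_1": [900, 200, 150, 100],
    "res_2": [800, 180, 120, 90],
    "com_1": [150, 300, 850, 200],
    "com_2": [120, 250, 950, 220],
}
_, scores = first_principal_component(list(stations.values()))
for name, s in zip(stations, scores):
    print(name, round(s, 1))
```

    On this toy input, the two morning-heavy stations receive scores of one sign and the two evening-heavy stations receive scores of the opposite sign. This is the kind of separation that lets station types be told apart. The overall sign of a principal component is arbitrary.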

    Reverse k-Nearest Neighbor Search based on Aggregate Point Access Methods

    We propose an original solution for the general reverse k-nearest neighbor (RkNN) search problem in Euclidean spaces. Unlike existing RkNN methods, which are subject to various limitations, our approach works on top of Multi-Resolution Aggregate (MRA) versions of any index structure for multi-dimensional feature spaces. In such a structure, each non-leaf node is additionally associated with aggregate information, such as the sum of all leaf entries indexed by that node. Our solution outperforms state-of-the-art RkNN algorithms in terms of query execution time because it exploits advanced strategies for pruning index entries.
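
    The role of aggregate information in RkNN pruning can be sketched as follows. This is not the paper's MRA-based algorithm. A hypothetical one-level grid stands in for the aggregate structure, with each cell storing only a bounding box and an object count. An object o belongs to RkNN(q) when fewer than k other objects are strictly closer to o than q is. Consider a cell E and the other cells whose maximum distance to E is smaller than E's minimum distance to q. If those cells together hold at least k objects, no object in E can qualify. E is then pruned using the counts alone, and only the surviving candidates are checked exactly. The function names, the grid, and the cell size are assumptions for illustration.

```python
import math
from collections import defaultdict


def is_rknn(points, i, q, k):
    """True iff fewer than k other points are strictly closer to points[i] than q."""
    dq = math.dist(points[i], q)
    closer = sum(1 for j, o in enumerate(points)
                 if j != i and math.dist(points[i], o) < dq)
    return closer < k


def brute_rknn(points, q, k):
    return [i for i in range(len(points)) if is_rknn(points, i, q, k)]


def min_dist_box_point(lo, hi, q):
    """Smallest distance from any point of box [lo, hi] to q."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for l, h, x in zip(lo, hi, q)))


def max_dist_boxes(a, b):
    """Largest distance between any point of box a and any point of box b."""
    (alo, ahi), (blo, bhi) = a, b
    return math.sqrt(sum(max(ah - bl, bh - al) ** 2
                         for al, ah, bl, bh in zip(alo, ahi, blo, bhi)))


def aggregate_rknn(points, q, k, cell=1.0):
    """Returns (sorted RkNN result, number of objects that survived pruning)."""
    buckets = defaultdict(list)
    for i, p in enumerate(points):
        buckets[tuple(math.floor(c / cell) for c in p)].append(i)
    # Aggregate entry per cell: tight bounding box and object count.
    agg = {}
    for key, ids in buckets.items():
        coords = [points[i] for i in ids]
        box = ([min(c) for c in zip(*coords)], [max(c) for c in zip(*coords)])
        agg[key] = (box, len(ids))

    candidates = []
    for key, (box, _) in agg.items():
        dq = min_dist_box_point(box[0], box[1], q)
        # Every object in such a cell is closer to every object in `box` than q is.
        dominating = sum(cnt for other, (obox, cnt) in agg.items()
                         if other != key and max_dist_boxes(box, obox) < dq)
        if dominating < k:
            candidates.extend(buckets[key])
    # Refinement: exact check only for objects in unpruned cells.
    result = sorted(i for i in candidates if is_rknn(points, i, q, k))
    return result, len(candidates)


points = [(0.2, 0.3), (1.5, 0.4), (10.1, 10.2), (10.3, 10.4),
          (10.2, 10.1), (12.5, 10.3), (3.7, 2.2)]
result, checked = aggregate_rknn(points, (0.0, 0.0), k=1)
print(result, checked)  # [0] 1
```

    Here six of the seven objects are discarded from counts and bounding boxes alone, and only one object needs the exact distance check.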